Abstract. We propose that Solomonoff induction is complete in the
physical sense via several strong physical arguments. We also argue that
Solomonoff induction is fully applicable to quantum mechanics. We show
how to choose an objective reference machine for universal induction by
defining a physical message complexity and physical message probability,
and argue that this choice dissolves some well-known objections to
universal induction. We also introduce many more variants of physical
message complexity based on energy and action, and discuss the
ramifications of our proposals.


“If you wish to make an apple pie from scratch, you must first invent the 
universe.” - Carl Sagan 

1 Introduction 

Ray Solomonoff discovered algorithmic probability and introduced the
universal induction method, which is the foundation of AGI theory [14].
Although the theory of Solomonoff induction is somewhat independent of physics,
we interpret it physically and try to refine the understanding of the theory by
thought experiments given constraints of physical law. First, we argue that its
completeness is compatible with contemporary physical theory, for which we
give arguments from modern physics that show Solomonoff induction to
converge for all possible physical prediction problems. Second, we define a
physical message complexity measure based on initial machine volume, and argue
that it has the advantage of objectivity and the typical disadvantages of using
low-level reference machines. However, we show that setting the reference machine
to the universe does have benefits, potentially eliminating some constants from
algorithmic information theory (AIT) and refuting certain well-known
theoretical objections to algorithmic probability. We also introduce a physical
version of algorithmic probability based on volume and propose six more variants
of physical message complexity.

2 Background 

Let us recall Solomonoff’s universal distribution. Let U be a universal
computer which runs programs with a prefix-free encoding like LISP. The
algorithmic probability that a bit string $x \in \{0,1\}^+$ is generated by a
random program $\pi \in \{0,1\}^+$ of U is:

    $P_U(x) = \sum_{U(\pi) = x(0|1)^*} 2^{-|\pi|}$    (1)

We also give the basic definition of Algorithmic Information Theory (AIT), where
the algorithmic entropy, or complexity, of a bit string $x \in \{0,1\}^+$ is
defined as $H_U(x) = \min(\{|\pi| \mid U(\pi) = x\})$. Solomonoff’s universal
sequence induction method works on bit strings x drawn from a stochastic source
$\mu$. Equation 1 is a semi-measure, but that is easily overcome as we can
normalize it. We merely normalize sequence probabilities, eliminating irrelevant
programs and ensuring that the probabilities sum to 1, from which point on
$P_U(x0|x) = P_U(x0)/P_U(x)$ yields an accurate prediction. The error bound for
this method is the best known for any such induction method. The total expected
squared error between $P_U(x)$ and $\mu$ is less than $-\frac{1}{2} \ln P_U(\mu)$
according to the convergence theorem proven in [?], and it is roughly
$H_U(\mu) \ln 2$ [15].
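
The definitions above can be illustrated with a minimal sketch. It does not use
a universal machine: as a loudly stated assumption, U is replaced by a
hypothetical toy machine whose programs are the self-delimiting strings
1^k 0 p (with |p| = k) and whose output is the pattern p repeated forever. The
names output_prefix, toy_probability and predict are illustrative only.

```python
from itertools import product

def output_prefix(pattern: str, n: int) -> str:
    """Toy machine (an assumption, not universal): first n bits of `pattern` repeated forever."""
    reps = n // len(pattern) + 1
    return (pattern * reps)[:n]

def toy_probability(x: str, max_k: int = 8) -> float:
    """Sum 2^-|pi| over toy programs pi whose output starts with x.

    A program is 1^k 0 p with |p| = k, so |pi| = 2k + 1 and the set of
    programs is prefix-free. Patterns longer than max_k are ignored.
    """
    total = 0.0
    for k in range(1, max_k + 1):
        for bits in product("01", repeat=k):
            pattern = "".join(bits)
            if output_prefix(pattern, len(x)) == x:
                total += 2.0 ** -(2 * k + 1)
    return total

def predict(x: str, bit: str, max_k: int = 8) -> float:
    """Normalized prediction of the next bit: P(x bit) / (P(x0) + P(x1))."""
    p0 = toy_probability(x + "0", max_k)
    p1 = toy_probability(x + "1", max_k)
    return (p0 if bit == "0" else p1) / (p0 + p1)

# After seeing 010101, short periodic programs dominate the sum,
# so the predicted probability of a following 0 is close to 1.
print(predict("010101", "0"))
```

Because every toy program emits an infinite sequence, P(x0) + P(x1) equals
P(x) here, so the normalization step is trivial; with a genuinely universal
machine it is not, which is why the text normalizes explicitly.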

3 Physical Completeness of Universal Induction 

The Solomonoff induction model is known to be complete and incomputable.
Equation 1 enumerates a non-trivial property of all programs (the membership
of a program’s output in a regular language), which makes it an incomputable
function. It is more properly construed as a semi-computable function that may
be approximated arbitrarily well in the limit. Solomonoff has shown that the
incomputability of algorithmic probability does not inhibit its practical
application in any fundamental way, and emphasized this often misunderstood
point in a number of publications.
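
Semi-computability can be made concrete with a small sketch of approximation
from below. The machine is again a hypothetical toy (an assumption, not a
universal computer): a program 1^k 0 p counts down from int(p, 2) in steps of
2, halts only if it reaches exactly 0, and then outputs its step count in
unary. Odd start values never halt, so no finite step budget can rule them
out, yet each stage of the approximation only ever adds probability mass.

```python
from itertools import product

def run(payload: str, max_steps: int):
    """Toy machine: returns the output if it halts within max_steps, else None."""
    n = int(payload, 2)
    steps = 0
    while n != 0:
        if steps == max_steps:
            return None          # still running: undecided at this budget
        n -= 2                   # odd start values skip 0 and never halt
        steps += 1
    return "1" * steps           # unary output

def lower_bound(x: str, stage: int) -> float:
    """Stage-t approximation: programs with |p| <= t, each run for t steps."""
    total = 0.0
    for k in range(1, stage + 1):
        for bits in product("01", repeat=k):
            if run("".join(bits), stage) == x:
                total += 2.0 ** -(2 * k + 1)
    return total

# The approximations never decrease and stay below the limit value 1/96.
approximations = [lower_bound("11", t) for t in range(1, 11)]
print(approximations)
```

For this toy machine the limit is known in closed form (the programs emitting
"11" are exactly those with payload value 4), which is what makes the bound
checkable; for a universal machine no such closed form is available.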

The only remaining assumptions for the convergence theorem to hold in general,
for any $\mu$, are a) that we have picked a universal reference machine, and
b) that $\mu$ has a computable probability density function (pdf). The second
assumption warrants our attention when we consider modern physical theory. We
formalize the computability of $\mu$ as follows:

    $H_U(\mu) < k, \quad \exists k \in \mathbb{Z}$    (2)

which entails that the pdf $\mu(x)$ can be simulated on a computer, while x are
(truly) stochastic. This condition is formalized likewise in [5].
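
The role of the computability condition in the error bound can be checked
numerically on a small stand-in. In the sketch below the universal mixture is
replaced by a finite Bayesian mixture over nine Bernoulli sources (an
assumption made for tractability), and the prior weight w of the true source
plays the part of $2^{-H_U(\mu)}$. The total expected squared prediction error
is computed by exact enumeration and compared with $-\frac{1}{2} \ln w$.

```python
import math
from itertools import product

def bernoulli_prob(theta: float, seq) -> float:
    """Probability of a bit sequence under an i.i.d. Bernoulli(theta) source."""
    p = 1.0
    for b in seq:
        p *= theta if b == 1 else 1.0 - theta
    return p

def mixture_next_one(thetas, weights, seq) -> float:
    """Mixture's predictive probability that the next bit is 1."""
    joint = [w * bernoulli_prob(t, seq) for t, w in zip(thetas, weights)]
    return sum(j * t for j, t in zip(joint, thetas)) / sum(joint)

def total_expected_sq_error(thetas, weights, mu: float, n: int) -> float:
    """Sum over the first n predictions of E_mu[(mixture - mu)^2], enumerated exactly."""
    total = 0.0
    for t in range(n):
        for seq in product((0, 1), repeat=t):
            err = mixture_next_one(thetas, weights, seq) - mu
            total += bernoulli_prob(mu, seq) * err * err
    return total

thetas = [i / 10 for i in range(1, 10)]        # hypothetical model class
weights = [1 / len(thetas)] * len(thetas)      # uniform prior over the class
mu = 0.5                                       # true source: white noise
error = total_expected_sq_error(thetas, weights, mu, n=10)
bound = -0.5 * math.log(weights[thetas.index(mu)])
print(error, bound)
```

The inequality error <= bound holds for any prior that gives the true source
non-zero weight; a source outside the class (the analogue of an incomputable
$\mu$) has no such guarantee.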

3.1 Evidence from physics 

There is an exact correspondence of such a construct in physics, which is
the quantum wave function. The wave function of a finite quantum system is
defined by a finite number of parameters (i.e., a complex vector), although its
product with its conjugate is a pdf from which we sample stochastic
observations. Since it is irrational to consider an infinite quantum system in
the finite observable universe, $\mu$ can model the statistical behavior of
matter for any quantum mechanical source. This is the first evidence of true,
physical completeness of Solomonoff induction we will consider. The von Neumann
entropy of a quantum system described by a density matrix $\rho$ is:

    $S = -\mathrm{tr}(\rho \ln \rho) = -\sum_j \eta_j \ln \eta_j$    (3)

where tr is the trace of a matrix and
$\rho = \sum_j \eta_j |j\rangle\langle j|$ is decomposed into its
eigenvectors. Apparently, von Neumann entropy is equivalent to classical entropy
and suggests a computable pdf, which is expected since we took $\rho$ to be a
finite matrix. Furthermore, the dynamic time evolution of a wave function is
known to be unitary, which entails that if $\mu$ is a quantum system, it will
remain computable dynamically. Therefore, if $\mu$ is a quantum system with a
finite density matrix, the convergence theorem holds.
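
As a worked instance of Equation 3, the following sketch computes the von
Neumann entropy of a single qubit and checks that unitary evolution leaves it
unchanged. The restriction to 2x2 density matrices (so that the eigenvalues
have a closed form) and all function names are assumptions made for
illustration.

```python
import math

def eigenvalues_2x2_hermitian(rho):
    """Eigenvalues of a 2x2 Hermitian matrix [[a, b], [conj(b), d]]."""
    a, b, d = rho[0][0].real, rho[0][1], rho[1][1].real
    mean = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + abs(b) ** 2)
    return mean - radius, mean + radius

def von_neumann_entropy(rho) -> float:
    """S = -tr(rho ln rho) = -sum_j eta_j ln eta_j, in nats (0 ln 0 taken as 0)."""
    return -sum(eta * math.log(eta)
                for eta in eigenvalues_2x2_hermitian(rho) if eta > 1e-12)

def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dagger(u):
    """Conjugate transpose."""
    return [[u[j][i].conjugate() for j in range(2)] for i in range(2)]

def evolve(rho, u):
    """Unitary evolution rho -> U rho U^dagger."""
    return matmul(matmul(u, rho), dagger(u))

s = 1 / math.sqrt(2)
hadamard = [[s + 0j, s + 0j], [s + 0j, -s + 0j]]
mixed = [[0.5 + 0j, 0j], [0j, 0.5 + 0j]]       # maximally mixed: S = ln 2
biased = [[0.8 + 0j, 0j], [0j, 0.2 + 0j]]      # eigenvalues 0.8 and 0.2

print(von_neumann_entropy(mixed))
print(von_neumann_entropy(biased), von_neumann_entropy(evolve(biased, hadamard)))
```

The entropy depends only on the eigenvalues $\eta_j$, which a unitary change
of basis does not alter; this is the sense in which the finite description of
the state is preserved under time evolution.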

The second piece of evidence from physical theory is that of the universal
quantum computer, which shows that any local quantum system may be simulated
by a universal quantum computer [7]. Since a universal quantum computer is
Turing-equivalent, this means that any local quantum system may therefore be
simulated on a classical computer. This fact has been interpreted as a
physical version of the Church-Turing thesis by the quantum computing pioneer
David Deutsch, in that ’every finitely realizable physical system can be
perfectly simulated by a universal model computing machine operating by finite
means’ [3]. As a quantum computer is equivalent to a probabilistic computer,
whose outputs are probabilistic after decoherence, these two facts together
entail that the pdf of a local quantum system is always computable. This yields
our second conclusion: if $\mu$ is a local quantum system, the convergence
theorem holds.

The third piece of evidence from physics is that of the famous Bekenstein
bound and the holographic principle. The Bekenstein bound was originally
conceived for black holes; however, it applies to any physical system, and
states that any finite-energy system enclosed within a finite volume of space
will have finite entropy:


    $S \le \frac{2\pi k R E}{\hbar c}$    (4)


where S is entropy, R is the radius of the sphere that encloses the system, E is
the total energy of the system including masses, and the rest are familiar
physical constants. Such a finite entropy readily transforms into Shannon
entropy, and corresponds to a computable pdf. The inequality is merely a
physical elucidation of Equation 2. Therefore, if $\mu$ is a finite-size and
finite-energy physical system, the convergence theorem holds.
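
To give a sense of scale, the bound can be evaluated numerically. The sketch
below converts the entropy bound into bits by dividing by $k \ln 2$, and
applies it to a system of mass 1 kg enclosed in a sphere of radius 1 m (an
arbitrary example chosen here, with $E = mc^2$). The constants are the SI
values; the function names are illustrative.

```python
import math

HBAR = 1.054571817e-34   # reduced Planck constant, J*s
C = 299792458.0          # speed of light, m/s
K_B = 1.380649e-23       # Boltzmann constant, J/K

def bekenstein_entropy(radius_m: float, energy_j: float) -> float:
    """Upper bound on entropy in J/K: S <= 2 pi k R E / (hbar c)."""
    return 2 * math.pi * K_B * radius_m * energy_j / (HBAR * C)

def bekenstein_bits(radius_m: float, energy_j: float) -> float:
    """The same bound expressed in bits (entropy divided by k ln 2)."""
    return bekenstein_entropy(radius_m, energy_j) / (K_B * math.log(2))

mass_kg = 1.0
bits = bekenstein_bits(1.0, mass_kg * C ** 2)
print(f"{bits:.3e} bits")   # on the order of 10^43 bits
```

The figure is enormous but finite, which is all the argument requires: a
source confined to such a region admits a finite description.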

Contemporary cosmology also affirms this observation, as the entropy of the 
observable universe has been estimated, and is naturally known to be finite [1]. 
Therefore, if contemporary cosmological models are true, any physical system in 
the observable universe must have finite entropy, thus validating the convergence 
theorem. 

Thus, since we have shown wide-reaching evidence for the computability of
the pdf of $\mu$ from quantum mechanics, general relativity, and cosmology, we
conclude that contemporary physical science strongly and directly supports the
universal applicability of the convergence theorem. In other words, it has been
physically proven.

3.2 Randomness, computability and quantum mechanics

Wood et al. interpreted algorithmic probability as a “universal mixture” [15],
which is essentially an infinite mixture of all possible computations that match
the input. This entails that it should model even random events, due to Chaitin’s
strong definitions of algorithmic randomness [2]. That is to say, the universal
mixture can model white noise perfectly (e.g., $\mu(x0) = \mu(x1) = 1/2$). More
expansive definitions of randomness are not empirically justifiable. Our
analysis is that stronger definitions of randomness are not needed as they would
be referring to halting oracles, which would be truly incomputable, and by our
arguments in this paper, have no physical relevance. Note that the halting
probability is semi-computable.

The computable pdf model is a good abstraction of the observations in
quantum mechanics (QM). In QM, the wave function itself has a finite description
(finite entropy), with unitary (deterministic) evolution, while the observations
(measurements) are stochastic. Solomonoff induction is complete with respect to
QM as well, even when we assume the reality of non-determinism, which many
interpretations of QM do admit. In other words, claims that Solomonoff
induction is not complete could be true only if either the physical
Church-Turing thesis were false or hypercomputers (oracle machines) were
possible, which seem to be equivalent statements. The physical constraints on a
stochastic source, however, rule out hypercomputers, which would have to either
contain an infinite amount of algorithmic information (infinite memory) or be
infinitely fast, both of which would require infinite entropy and infinite
energy. A hypercomputer is often imagined to use a continuous model of
computation which stores information in real-valued variables. By AIT, a random
real has infinite algorithmic entropy, which contradicts the Bekenstein bound
(Equation 4). Such real-valued variables are ruled out by the uncertainty
principle, which places fundamental limits on the precision of any physical
quantity: measurements beneath the Planck scale are impossible. Hypercomputers
are also directly ruled out by limits of quantum computation [6]. In other
words, QM strongly supports the stochastic computation model of Solomonoff.

4 On The Existence of an Objective U 

The universal induction model is seen to be subjective, since the
generalization error depends on the choice of a universal computer U, as the
convergence theorem shows. This choice is natural according to a Bayesian
interpretation of learning, as U may be considered to encode the subjective
knowledge of the observer. Furthermore, the invariance theorem may be
interpreted to imply that the choice of a reference machine is irrelevant.
However, it is still an arbitrary choice. A previous proposal learns reference
machines under which good programs are short, in the context of universal
reinforcement learning [?].

4.1 The universe as the reference machine 

In the following, we shall examine a sense in which we may consider the best
choice for U. Solomonoff himself mentioned such a choice [?], explaining that
he did find an objective universal device but dismissed it because it did not have
any prior information, since subjectivity is a desirable and necessary feature of
algorithmic probability.

We proposed a philosophical solution to this problem in a previous article,
where we made a physical interpretation of algorithmic complexity by setting U
to the universe itself [10]. This was achieved by adopting a physical definition
of complexity, wherein program length was interpreted as physical length. The
correspondence between spatial extension and program length directly follows
from the proper physicalist account of information, for every bit extends in
space. This naturally gives rise to the definition of physical message
complexity as the volume of the smallest machine that can compute a message,
eliminating the requirement of a reference machine. There are a few difficulties
with such a definition of complexity whose analysis is in order. Contrast also
with thermodynamic entropy and Bennett’s work on physical complexity [?].

4.2 Minimum machine volume as a complexity measure 

In the present article, we support the above philosophical solution to the
choice of the reference machine with basic observations. Let us define physical
message complexity:

    $C_V(x) = \min\{V(M) \mid M \to x\}$    (5)

where $x \in D^+$ is any d-ary message written in an alphabet D, M is any
physical machine (finite mechanism) that emits the message x (denoted
$M \to x$), and $V(M)$ is the volume of machine M. M ranges over all physical
computers that can emit message x.
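
The shape of this definition can be sketched in a few lines. A real evaluation
would have to range over all physically realizable mechanisms; here, purely as
an assumption for illustration, the search is over a small hypothetical
catalogue of machines with made-up volumes, and the Machine class and its
fields are invented names.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional

@dataclass(frozen=True)
class Machine:
    name: str
    volume: float               # hypothetical units, e.g. Planck volumes
    emit: Callable[[], str]     # the message this mechanism emits

def physical_message_complexity(x: str, machines: Iterable[Machine]) -> Optional[float]:
    """C_V(x): the smallest volume among catalogued machines that emit x."""
    volumes = [m.volume for m in machines if m.emit() == x]
    return min(volumes) if volumes else None

# Made-up catalogue: two mechanisms emit the same message at different volumes.
catalogue = [
    Machine("lookup table", 40.0, lambda: "01010101"),
    Machine("oscillator", 12.0, lambda: "01" * 4),
    Machine("constant source", 5.0, lambda: "00000000"),
]

print(physical_message_complexity("01010101", catalogue))
```

Returning None for a message that no catalogued machine emits reflects only
the finiteness of the catalogue, not a claim about the physical measure.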

Equation 5 is too abstract and it would have to be connected to physical
law to be useful. However, it allows us to reason about the constraints we wish
to put on physical complexity. If we imagine what sort of device M would be,
M is supposed to range over every possible physical computer that can emit a
message. For this definition to be useful, the concept of emission would have
to be determined. Imagine for now that the device emits photons that can be
detected by a sensor, interpreting the presence of a photon with frequency
$f_i$ as $d_i \in D$. It might be hard for us to build the minimal device that
can do this. However, let us assume that such a device can exist and be
simulated. It is likely that this minimal hardware would occupy quite a large
volume compared to the output it emits. With every added unit of message
complexity, the minimal device would have to get larger. We may consider
additional complications. For instance, we may demand that these machines do not
receive any physical input, i.e., supply their own energy, which we call a
self-contained mechanism. We note that resource bounds can also be naturally put
into this picture.

When we use $C_V(x)$ instead of $H_U(x)$, we not only eliminate the need for a
reference machine, but also eliminate many constraints and constants in AIT.
First of all, there is not the same worry of a self-delimiting program, because
every physical machine that can be constructed will either emit a message or not
in isolation, although its meaning slightly changes and will be considered in the
following. Secondly, we expect all the basic theorems of AIT to hold, while the
arbitrary constants that correspond to glue code are eliminated or minimized.
Recall that the constants in AIT usually correspond to such elementary
operations as function composition and so forth. Let us consider the
sub-additivity of information, which represents a good example:
$H_U(x,y) = H_U(x) + H_U(y|x) + O(1)$. When we consider $C_V(x,y)$, however, the
sub-additivity of information becomes exactly $C_V(x,y) = C_V(x) + C_V(y|x)$,
since there does not need to be a gap between a machine emitting a photon and
another sensing one. In the consideration of an underlying physical theory of
computing (like quantum computing), the relations will further change, and
become ever clearer.


4.3 Volume based algorithmic probability 

From the viewpoint of AI theory, however, what we are interested in is
whether the elimination of a reference machine may improve the performance
of machine learning. Recall that the convergence theorem is related to the
algorithmic entropy of the stochastic source with respect to the reference
machine. A reasonable concern in this case is that the choice of a “bad”
reference machine may inflate the errors prohibitively for small data sizes, for
which induction works best, i.e., the composition of a physical system may be
poorly reflected in an artificial language, increasing generalization error. On
the other hand, setting U to the universe obtains an objective measurement,
which does not depend on subjective choices, and furthermore, always corresponds
well to the actual physical complexity of the stochastic source. We shall first
need to re-define algorithmic probability for an alphabet D. We propose using
the exponential distribution for a priori machine probabilities, which would be
applicable to any choice of real-valued units, although we would favor Planck
units and integer measurements of volume.


    $P(x) = \frac{\sum_{M \to xD^*} \lambda e^{-\lambda V(M)}}{\sum_{M \to D^+} \lambda e^{-\lambda V(M)}}$    (6)

An unbiased choice for the parameter $\lambda$ here would be 1; however, for
physical reasons a smaller parameter may be preferable. Here, it does not matter
whether machine encodings of M are prefix-free, because infinity is not a valid
concern in physical theory, and any arrangement of quanta is possible (although
not stable). Due to general relativity, there cannot be any influence from
beyond the observable universe, i.e., there is not enough time for any message
to arrive from beyond it, even if there is anything beyond the cosmic horizon.
Therefore, the volume $V(M)$ of the largest machine is constrained by the volume
of the observable universe, i.e., it is finite. Hence, the sums always converge.
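
A minimal numerical sketch of the volume-based probability follows. As before,
the set of machines is a small made-up catalogue rather than the set of all
physical mechanisms, each entry being a hypothetical (volume, emitted message)
pair; the numerator sums over machines whose message begins with x and the
denominator over machines that emit any non-empty message.

```python
import math

def volume_probability(x: str, machines, lam: float = 1.0) -> float:
    """Ratio of exponential-prior weights lam * exp(-lam * V(M)).

    machines: list of (volume, emitted message) pairs (hypothetical data).
    """
    def weight(volume: float) -> float:
        return lam * math.exp(-lam * volume)

    numerator = sum(weight(v) for v, msg in machines if msg.startswith(x))
    denominator = sum(weight(v) for v, msg in machines if msg)
    return numerator / denominator

# Made-up catalogue: volumes in arbitrary integer units.
machines = [(3, "0101"), (5, "0110"), (4, "1111"), (9, "0101")]

print(volume_probability("01", machines))
print(volume_probability("0", machines) + volume_probability("1", machines))
```

Because the catalogue is finite, both sums trivially converge; the text's
argument is that the finite volume of the observable universe plays the same
role for the full definition.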


4.4 Minimum machine energy and action 

We now propose alternatives to minimum machine volume complexity. While
volume quantifies the initial space occupied by a machine, energy accounts for
every aspect of operation. In general relativity, the energy distribution
determines the curvature of space-time, and energy is equivalent to mass via
creation and annihilation of particle-antiparticle pairs. Likewise, the unit of
h is J·s, i.e., an energy-time product; it is the quantum of action and
quantifies the dynamical evolution of physical systems. Let
$C_E(x) = \min\{E(M) \mid M \to x\}$ be the energy complexity of a message, and
$C_A(x) = \min\{A(M) \mid M \to x\}$ the action (or action volume $E \cdot t$)
complexity of a message, which quantify the computation and transmission
of message x by a finite mechanism [5]. Further variants may be construed by
considering how much energy and action it takes to build M from scratch, which
include the work required to make the constituent quanta, and are called
constructive energy $C_{E_c}(x)$ and action $C_{A_c}(x)$ complexity of messages,
respectively. Measures may also be defined to account for both machine
construction and message transmission, called total energy $C_{E_t}(x)$ and
total action $C_{A_t}(x)$ complexity of messages. Versions of algorithmic
probability may be defined for each of these six new complexity measures in a
similar manner to Equation 6. Note that the trick in algorithmic probability is
maximum uncertainty about the source $\mu$. For energy-based probability, if
$\mu$ is at thermal equilibrium we may thus use the Boltzmann distribution
$P(M) = e^{-E/kT}$ for a priori machine probabilities instead of the
exponential distribution, which also maximizes uncertainty. We may also model
a priori probabilities with a canonical ensemble, using $P(M) = e^{(F-E)/kT}$,
where F is the Helmholtz free energy.
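
The two energy-based priors can be compared numerically. The sketch below
normalizes the Boltzmann weights by the partition function Z, computes the
Helmholtz free energy as $F = -kT \ln Z$, and confirms that $e^{(F-E)/kT}$
gives the same normalized distribution. The machine energies are made-up
values and the normalization step is an addition made here for illustration.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K

def boltzmann_prior(energies_j, temperature_k):
    """Normalized Boltzmann weights exp(-E/kT) / Z."""
    kt = K_B * temperature_k
    e0 = min(energies_j)                       # shift for numerical stability
    weights = [math.exp(-(e - e0) / kt) for e in energies_j]
    z = sum(weights)
    return [w / z for w in weights]

def canonical_prior(energies_j, temperature_k):
    """Canonical-ensemble form exp((F - E)/kT) with F = -kT ln Z."""
    kt = K_B * temperature_k
    e0 = min(energies_j)
    z_shifted = sum(math.exp(-(e - e0) / kt) for e in energies_j)
    free_energy = e0 - kt * math.log(z_shifted)   # equals -kT ln Z
    return [math.exp((free_energy - e) / kt) for e in energies_j]

# Hypothetical machine energies in joules, at room temperature.
energies = [1e-21, 2e-21, 4e-21]
print(boltzmann_prior(energies, 300.0))
print(canonical_prior(energies, 300.0))
```

Lower-energy machines receive higher prior probability, mirroring the way
smaller volumes are favored under the exponential prior of Equation 6.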

4.5 Restoring subjectivity 

Solomonoff’s observation that subjectivity is required to solve any problem of
significant complexity is of paramount importance. Our proposal of using a
physical measure of complexity for objective inference does not neglect that
property of universal induction. Instead, we observe that a guiding pdf contains
prior information. Let $U_1$ be a universal computer that contains
much prior information about a problem domain, based on a universal computer
U that does not contain any significant information. Such prior information may
always be split off to a memory bank M:

    $P_{U_1}(x) = P_U(x|M)$    (7)

Therefore, we can use a conditional physical message complexity given a memory 
bank to account for prior information, instead of modifying a pdf. Subjectivity 
is thus retained. Note that the universal induction view is compatible with a 
Bayesian interpretation of probability, while admitting that the source is real, 
which is why we can eliminate the bias about reference machine - there is a 
theory of everything that accurately quantifies physical processes in this universe. 

Choosing the universe as U has the particular disadvantage of using the lowest
possible level of computer architecture. Science has not yet formulated complete
descriptions of the computation at the lowest level of the universe; therefore
further research is needed. However, for solving problems at macro-scale, and/or
from artificial sources, algorithmic information pertaining to such domains must
be encoded as prior information in M, since otherwise the solution would be
infeasible.

4.6 Quantum algorithmic probability and physical models 

Note that it is well possible to extend the proposal in this section to a
quantum version of AIT by setting U to a universal quantum computer. There are
likely other advantages of using a universal quantum computer, e.g., efficient
simulation of physical systems. For instance, the quantum circuit model may be
used, which seems to be closer to actual quantum physical systems than the
Quantum Turing Machine model [3]. A universal quantum computer model will also
extend the definition of message to any quantum measurement. In particular,
the input to the quantum circuit is $|0\ldots\rangle$ (null) while the output is
the quantum measurement of message $|x\rangle$. Since quantum computers are
probabilistic, multiple trials must be conducted to obtain the result with high
probability. Also, Grover’s algorithm may be applied to accelerate universal
induction approximation procedures.
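
How Grover’s algorithm would be wired into an induction procedure is not
specified here, so the following is only a classical state-vector simulation
of standard Grover search, under the assumption that the candidates are N
program indices of which exactly one is marked (for instance, the one
reproducing the observed data). It shows why roughly $(\pi/4)\sqrt{N}$ oracle
calls suffice and why repeated trials are still needed.

```python
import math

def grover_success_probability(n_items: int, marked: int, iterations: int) -> float:
    """Probability of measuring the marked index after the given Grover iterations."""
    amp = [1.0 / math.sqrt(n_items)] * n_items     # uniform superposition
    for _ in range(iterations):
        amp[marked] = -amp[marked]                 # oracle: flip the marked amplitude
        mean = sum(amp) / n_items
        amp = [2.0 * mean - a for a in amp]        # diffusion: inversion about the mean
    return amp[marked] ** 2

def optimal_iterations(n_items: int) -> int:
    """Standard choice: floor((pi / 4) * sqrt(N))."""
    return math.floor(math.pi / 4 * math.sqrt(n_items))

for n in (8, 1024):
    k = optimal_iterations(n)
    print(n, k, grover_success_probability(n, marked=3, iterations=k))
```

The success probability is high but below 1, so a measurement may return a
wrong index; this is the reason multiple trials are conducted.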

All physical systems do reduce properly to quantum systems, however, only 
problems at the quantum-scale would require accurate simulation of quantum 
processes. An ultimate AGI system would choose the appropriate physical model 
class for the scale and domain of sensor readings it processes. Such a machine 
would be able to adjust its attention to the scale of collisions in LHC, or galaxy 
clusters according to context. This would be an important ability for an artificial 
scientist, as different physical forces are at play at different scales; nature is not 
uniformly scale-free, although some statistical properties may be invariant across 
scales. The formalism of phase spaces and stochastic dynamical systems may 
be used to describe a large number of physical systems. What matters is that 
a chosen physical formalism quantifies basic physical resources in a way that 
allows us to formulate physical complexity measures. We contend however that 
a unified language of physics is possible, in accordance with the main tenets of 
logical empiricism. 

4.7 The physical semantics of halting probability 

The halting probability $\Omega_U$ is the probability that a random program of U
will halt, and it is semi-computable much like algorithmic probability. What
happens when we set U to the universe? We observe that there is an irreducible
mutual algorithmic information between any two stochastic sources, which is the
physical law, or the finite set of axioms of physics (incomplete presently). This
irreducible information corresponds to U in our framework, and it is equivalent
to the uniformity of physical law in cosmology, for which there is a wealth of
evidence [18]. It is known that $\Omega_U$ contains information about difficult
conjectures in mathematics, as most can be transformed to instances of the
halting problem. Setting U to a (sufficiently complete) theory of physics biases
$\Omega_U$ to encode the solutions of non-trivial physical problems in shorter
prefixes of its binary expansion, while it still contains information about any
other universal machines and problems stated within them, e.g., imaginary worlds
with alternative physics.

5 Discussion 

5.1 Dissolving the problem of induction 

The problem of induction is an old philosophical riddle: we cannot justify
induction by itself, since that would be circular. If we follow the proposed
physical message complexity idea, for the first capable induction systems
(brains) to evolve, they did not need to have an a priori, deductive proof of
induction. However, the evolution process itself works inductively as it
proceeds from simpler to more complex forms which constitute and expend more
physical entropy. Therefore, induction does explain how inductive systems can
evolve, an explanation that we might call a glorious recursion, instead of a
vicious circle: an inductive system can invent an induction system more powerful
than itself, and it can also invent a computational theory of how itself works
when no such scientific theory previously existed, which is what happened in
Solomonoff’s brain.

5.2 Disproving Boltzmann brains 

The argument from practical finiteness of the universe was mentioned briefly
by Solomonoff in [?]. Let us note, however, that the abstract theory of
algorithmic probability implies an infinite probabilistic universe, in which
every program may be generated, and each bit of each program is equiprobable. In
such an abstract universe, a Boltzmann Brain with considerably more entropy than
our humble universe is even possible, although it has a vanishingly small
probability. In a finite observable universe with finite resources, however, we
obtain a slightly different picture: any Boltzmann Brain is improbable, and
a Boltzmann Brain with a much greater entropy than our universe would be
impossible (0 probability). Obviously, in a sequence of universes with increasing
volume of observable universe, the limit would be much like pure algorithmic
probability. However, for our definition of physical message complexity, a proper
physical framework is much more appropriate, and such considerations quickly
veer into the territory of metaphysics (since they truly consider universes with
physical law unlike our own). Thus firmly footed in contemporary physics, we
gain a better understanding of the limits of ultimate intelligence.

5.3 Refuting the Platonist objection to algorithmic information 

An additional nice property of using physical stochastic models, e.g.,
statistical mechanics, stochastic dynamical systems, quantum computing models,
instead of abstract machine or computation models is that we can refute a
well-known objection to algorithmic information by Raatikainen [?], which
depends on unnatural enumerations of recursive functions, essentially
constructing reference machines with a lot of useless information. Such
superfluous reference machines would incur a physical cost in physical message
complexity, and therefore they would not be picked by our definition, which is
exactly why you cannot shuffle program indices as you like, because such
permutations require additional information to encode. An infinite random
shuffling of the indices would require infinite information, and is impossible
in the observable universe, and any substantial reordering would incur
inordinate physical cost in a physical implementation of the reference machine.
Raatikainen contends that his self-admittedly bizarre and unnatural
constructions are fair play because a particular way of representing the class
of computable functions cannot be privileged. Better models of computation
accurately measure time, space and energy complexities of physical devices,
which is why they are privileged. The RAM machine model is a better model
of personal computers with von Neumann architecture than a Turing Machine,
which is preferable to a model with no physical complexity measures.

5.4 Concluding remarks and future work 

We have introduced the basic philosophical problems of an investigation into
the ultimate limits of intelligence. We have covered a very wide philosophical
terrain of physical considerations of completeness and objective choice of
reference machine, and we have proposed several new kinds of physical message
complexity and probability. Much work remains to fully connect the existing body
of physical theory to algorithmic probability.